# **Towards Generalizable Multi-Policy Optimization with Self-Evolution for Job Scheduling**

### **Abstract:**

---

Reinforcement Learning (RL) has shown promising results in solving Job Scheduling Problems (JSPs), automatically deriving powerful dispatching rules from data without relying on expert knowledge. However, most RL-based methods rely on single-policy strategies, which significantly limits their ability to explore diverse scheduling behaviors. Moreover, designing tailored reward functions for each JSP variant remains a challenging and labor-intensive task. To address these issues, we introduce a generic learning framework that optimizes multiple policies sharing a common objective and a single neural network, while still allowing each policy to learn specialized and complementary problem-solving strategies. This optimization process is driven entirely by autonomously generated self-teaching labels, eliminating the need for manually crafted reward functions. In addition, we develop a training scheme that adaptively controls the imitation degree to reflect the quality of self-labels and enhances sample efficiency. Experimental results show that our method successfully addresses the aforementioned challenges, significantly outperforming previous state-of-the-art methods across five JSP variants. Furthermore, our approach demonstrates remarkable effectiveness on other combinatorial optimization problems, highlighting its broad applicability and generality beyond JSPs.
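
The abstract does not spell out how the self-teaching labels are produced. As a rough illustration only, the sketch below assumes one plausible reading: every policy proposes a schedule, and the best proposal becomes the imitation target. The toy single-machine objective (total completion time), the priority-vector "policies", and all function names are hypothetical and are not taken from this repository.

```python
import math
import random


def total_completion_time(sequence, processing_times):
    """Sum of job completion times on one machine (assumed toy objective)."""
    elapsed, total = 0, 0
    for job in sequence:
        elapsed += processing_times[job]
        total += elapsed
    return total


def sample_sequence(priorities, rng, temperature=1.0):
    """Draw jobs without replacement, softmax-weighted by a priority vector."""
    remaining = list(range(len(priorities)))
    sequence = []
    while remaining:
        weights = [math.exp(priorities[j] / temperature) for j in remaining]
        job = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(job)
        sequence.append(job)
    return sequence


def make_self_label(policies, processing_times, rng):
    """Each hypothetical policy proposes one schedule; keep the cheapest one."""
    proposals = [sample_sequence(p, rng) for p in policies]
    costs = [total_completion_time(s, processing_times) for s in proposals]
    best = min(range(len(proposals)), key=costs.__getitem__)
    return proposals[best], costs[best], best


if __name__ == "__main__":
    rng = random.Random(0)
    processing_times = [3, 1, 2]
    # Three stand-in "policies", each just a vector of job priorities.
    policies = [[0.0, 0.0, 0.0], [-3.0, -1.0, -2.0], [1.0, 0.0, 2.0]]
    label, cost, winner = make_self_label(policies, processing_times, rng)
    print(f"self-label {label} (cost {cost}) came from policy {winner}")
```

In an actual training loop the selected schedule would serve as a supervised target for the shared network; how strongly each policy imitates it is what the adaptive scheme described in the abstract controls, and that part is deliberately left out of this sketch.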

### **Dependencies:**

---
- `python==3.10.13`
- `torch==2.2.2`
- `torch_geometric==2.5.2`
- `numpy==1.24.3`
- `pytz==2024.1`
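
To confirm that an environment matches these pins before running experiments, a small check along the following lines may help. It is only a sketch: the `PINNED` dictionary and `check_pins` helper are illustrative names rather than part of this repository, and the comparison ignores local build suffixes such as `+cu121`, which some `torch` wheels append to their version string.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the dependency list; the Python version is not checked here.
PINNED = {
    "torch": "2.2.2",
    "torch_geometric": "2.5.2",
    "numpy": "1.24.3",
    "pytz": "2024.1",
}


def check_pins(pinned):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            # Drop a local build tag such as "+cu121" before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```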

### **Benchmark Problems:**

---

* Single Machine Scheduling Problem (SMSP). Please refer to the `SMSP/` directory.
* Unrelated Parallel Machine Scheduling Problem (UPMSP). Please refer to the `UPMSP/` directory.
* Permutation Flow Shop Scheduling Problem (PFSP). Please refer to the `PFSP/` directory.
* Flexible Flow Shop Scheduling Problem (FFSP). Please refer to the `FFSP/` directory.
* Job Shop Scheduling Problem (JSSP). Please refer to the `JSSP/` directory.